Using explosive percolation in analysis of real-world networks
We apply a variant of the explosive percolation procedure to large real-world networks, and show
with finite-size scaling that the universality class, ordinary or explosive, of the resulting percolation
transition depends on the structural properties of the network as well as the number of unoccupied
links considered for comparison in our procedure. We observe that in our social networks, the
percolation clusters close to the critical point are related to the community structure. This relationship
is further highlighted by applying the procedure to model networks with pre-defined communities.



PACS numbers: 89.75.Fb,64.60.ah,89.75.Hc,89.75.Da 
I. INTRODUCTION 



The percolation process realized by the Achlioptas procedure [1] is different from classical
percolation. This "explosive percolation" begins with a graph of isolated nodes, and at each step
two potential edges are chosen at random. Then, the edge that minimizes the product or sum
of the sizes of the two components that would be merged is added to the graph. This procedure
eventually leads to an explosive percolation transition that appears discontinuous (first order).
However, it has recently been argued that in reality the transition is continuous and belongs to
a new universality class with a very small exponent of the order parameter [2]. The above or similar
procedures have been applied to various model networks ranging from regular lattices [3] to
scale-free networks [4]. Several papers have painted an intuitive picture of the mechanisms behind
this behavior, such as local cluster aggregation [5], formation of many large components before
the percolation transition [6], or inhibition of growth of the largest cluster [7]. Other criteria for
the growth process have also been suggested, such as choosing edges proportionally to a weight
determined by their cluster sizes [8].

While explosive percolation has triggered a considerable amount of theoretical and simulation
work, its application to real-world networks or processes has been limited [9]. The topological
characteristics of real-world networks, such as high clustering, degree correlations, community
structure, and weight-topology correlations, are far from those of regular or random model
graphs [10]. Such features play a role in the characteristics of classical percolation, which has
earlier been successfully applied to investigate real-world network structure. Here we ask
if they also play a crucial role in explosive percolation, and if monitoring the percolation process
itself yields important information about the network structure. As a prerequisite, we establish
that proper link addition rules yield explosive percolation transitions when applied to
real-world networks. However, this depends both on the network structure and the details of the
evolution rules.



II. DATA AND METHODS 

For our empirical networks, we have chosen a mobile phone call network (MPC) [11] and a large
arXiv co-authorship network (CA) [12]. Both networks are social, so that nodes represent people
and ties their interactions, and are large enough for percolation studies. They also share features
common to social networks, such as community structure and assortativity [10]. For the MPC,
it has been shown that tie strengths relate to network topology: strong ties are associated with
dense network neighborhoods (communities) [13]. Such weight-topology correlations are reflected
in classical percolation behavior. For the CA, to the best of our knowledge, weight-topology
correlations have not been studied in detail before.

The MPC data consists of $325 \times 10^6$ voice calls over a period of 120 days. We construct an
aggregated undirected weighted network in which two users are joined by an edge if there are
bidirectional calls between them, with weights representing the total number of calls. The largest
connected component (LCC) is then extracted, with $4.6 \times 10^6$ nodes and $9.1 \times 10^6$
edges. The collaboration data is from the arXiv [14] and contains all e-prints in "physics" until
March 2010. There are $4.8 \times 10^5$ article headers, from which we extract the authors.
In the co-authorship (CA) network two authors are connected if they have co-authored articles,
whose number determines the link weight. We then extract the LCC, with $1.8 \times 10^5$ nodes
and $9.1 \times 10^6$ edges. In addition, we construct a filtered version of the CA, where articles
with more than 10 authors ($\sim 2\%$ of articles) are ignored. This is to remove the very large
cliques from papers with $\sim 10^3$ authors in fields such as hep-ex or astro-ph, where the
principles behind collaboration network formation appear different. The LCC of the resulting
small collaboration co-authorship (SCA) network has $1.5 \times 10^5$ nodes and
$9.1 \times 10^5$ edges. Note that, although the number of nodes is not much smaller than for the
CA, the number of edges is an order of magnitude less.

For the percolation process, we use the min-cluster (MC-$m$) sum rule with different values of
$m$, defined as follows. Initially, all the edges of the empirical network are considered
unoccupied. Then, at each time step, $m$ unoccupied edges are drawn at random. Out of these,
the edge that would minimize the size of the component formed if the edge were occupied is chosen.
Intra-component edges are always favored over inter-component edges, as they do not increase the
size of any cluster. When comparing two inter-component edges, we select the one for which the
sum of the cluster sizes that it connects is minimized. Ties are resolved randomly. We also study
the limiting case ($m = \infty$), where all unoccupied edges are considered at each step. This
leads to a semi-deterministic process where all intra-cluster links get occupied before the cluster
grows in size. The only source of randomness is the existence of clusters of the same size during
the process [15].
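
As a rough illustration of the rule described above, the following Python sketch runs the MC-$m$ sum rule on a small edge list. The names `UnionFind` and `mc_m_percolation`, the integer node labels 0..N-1, and the swap-remove bookkeeping are assumptions made for this example and are not taken from the original implementation. Passing `m` at least as large as the number of edges mimics the MC-$\infty$ limit, although a dedicated implementation would be far more efficient.

```python
import random


class UnionFind:
    """Disjoint-set structure that tracks component sizes."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b and return the merged size."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return self.size[ra]
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return self.size[ra]


def mc_m_percolation(n_nodes, edges, m, seed=0):
    """Occupy all edges with the MC-m sum rule (illustrative sketch).

    Returns the order in which edges were occupied and, after each step,
    the pair (tau, s_max): number of inter-cluster edges added so far and
    the size of the largest cluster.
    """
    rng = random.Random(seed)
    uf = UnionFind(n_nodes)
    unoccupied = list(edges)
    tau, s_max = 0, 1
    order, history = [], []
    while unoccupied:
        # Draw m candidate positions (fewer if not enough edges remain).
        picks = rng.sample(range(len(unoccupied)), min(m, len(unoccupied)))
        costs = []
        for idx in picks:
            ru, rv = (uf.find(x) for x in unoccupied[idx])
            # Intra-cluster edges cost 0, so they always win over
            # inter-cluster edges, whose cost is the sum of cluster sizes.
            costs.append(0 if ru == rv else uf.size[ru] + uf.size[rv])
        best = min(costs)
        # Ties are resolved randomly.
        chosen = rng.choice([i for i, c in zip(picks, costs) if c == best])
        u, v = unoccupied[chosen]
        unoccupied[chosen] = unoccupied[-1]  # swap-remove the chosen edge
        unoccupied.pop()
        if uf.find(u) != uf.find(v):
            tau += 1
            s_max = max(s_max, uf.union(u, v))
        order.append((u, v))
        history.append((tau, s_max))
    return order, history
```

For example, `mc_m_percolation(6, edges, m=2)` on two triangles joined by a single bridge occupies all seven edges and ends with five inter-cluster edges and a largest cluster of six nodes.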

III. RESULTS 
A. Percolation analysis 

Let us first monitor the behavior of the order parameter, i.e., the relative size of the largest
cluster, $s_{\max}/N$, as the fraction of occupied edges $f_{\text{links}}$ is increased. As
intra-cluster edges do not affect cluster growth, we consider the number of inter-cluster edges
$\tau$ instead of $f_{\text{links}}$ [16]. We apply three variants of the MC rule: MC-2, MC-10,
and MC-$\infty$, as well as random link percolation for comparison. Fig. 1(a,b,c) shows the
variation of the fraction $s_{\max}(\tau)/N$ against the scaled number of inter-component
edges, $\tau/N$. For all three networks, the transition of the order parameter is smooth for the
random case, while for the extreme case, MC-$\infty$, the transition appears abrupt.
However, for MC-2 and MC-10, the situation is more complicated, and we study them in detail.

To determine the nature of the transition, Achlioptas et al. [1] studied the dependence of the
width of the transition window on system size. This width can be quantified as
$\Delta = \tau(N/2) - \tau(\sqrt{N})$, where $\tau(N/2)$ and $\tau(\sqrt{N})$ are the lowest
values of $\tau$ for which $s_{\max} > N/2$ and $s_{\max} > \sqrt{N}$, respectively. In general,
the width scales as a power law with the system size, $\Delta \propto N^{\zeta}$. For classical
percolation, $\zeta = 1$. It was argued that for explosive percolation $\zeta < 1$, so that the
rescaled width of the transition region, $\Delta/N \propto N^{\zeta - 1}$, vanishes in the limit
of large $N$. While recent results [2] argue that the transition region is in reality finite, the
very small exponent of the order parameter guarantees that in practice it is vanishingly
small even for large systems.
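
The window width and its scaling exponent can be sketched in a few lines of Python. The function names and the input format, a list of $(\tau, s_{\max})$ pairs ordered by $\tau$, are assumptions for this illustration; the exponent is estimated here with an ordinary least-squares fit in log-log space, which is one simple choice and not necessarily the fitting procedure used for the figures.

```python
import math


def transition_window(history, n_nodes):
    """Return Delta = tau(N/2) - tau(sqrt(N)) from (tau, s_max) pairs."""
    tau_sqrt = tau_half = None
    for tau, s_max in history:
        # Lowest tau at which the largest cluster exceeds sqrt(N).
        if tau_sqrt is None and s_max > math.sqrt(n_nodes):
            tau_sqrt = tau
        # Lowest tau at which the largest cluster exceeds N/2.
        if s_max > n_nodes / 2:
            tau_half = tau
            break
    if tau_sqrt is None or tau_half is None:
        raise ValueError("largest cluster never exceeds the thresholds")
    return tau_half - tau_sqrt


def fit_exponent(sizes, widths):
    """Least-squares slope of log(width) against log(size).

    All widths must be positive, since their logarithm is taken.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(d) for d in widths]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applying `transition_window` to each sampled subnetwork and passing the resulting sizes and widths to `fit_exponent` gives an estimate of $\zeta$.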

For applying finite-size scaling to empirical networks, samples of different sizes are needed.
In general, unbiased sampling of a network is difficult. Here, we take advantage of the known
properties of our networks. Call networks are geographically embedded [17], and we extract
subnetworks of users in chosen cities, based on the postal codes of their subscriptions. For the
co-authorship networks, we extract subnetworks of authors with articles in the same subject class.
We see that for all networks $\Delta \propto N^{\zeta}$, with $\zeta \approx 1$ for random and
$\zeta \approx 0.5$ for the MC-$\infty$ case (Fig. 1(d,e,f)). Thus the exponent $\zeta$ clearly
differentiates the explosive transition from random-link percolation. Further, for all three
networks, $\zeta \approx 1$ for the MC-2, resembling an ordinary percolation transition.
However, for MC-10, the scaling exponent behaves differently for the three networks. For the MPC
and SCA networks, $\zeta \approx 0.5$, indicating explosive percolation. For the CA, at first it
appears that the data points do not follow scaling. However, a closer inspection shows that they
cluster around two straight lines with $\zeta \approx 1$ and $\zeta \approx 0.5$.
Indeed, for subnetworks with large collaborations (e.g., hep-ex, hep-ph) $\zeta \approx 1$,
whereas for other subject classes (e.g., cond-mat, math-ph), $\zeta \approx 0.5$.

In addition, we have performed a finite-size scaling analysis of the order parameter
$s_{\max}/N$ [15]. The scaling relation for $s_{\max}/N$ is given by

$$\frac{s_{\max}}{N} = N^{-\beta/\nu} F\left[(\tau - \tau_c) N^{1/\nu}\right], \qquad (1)$$

where $F$ is some universal function, $\tau$ is the control parameter, $\tau_c$ is the critical
point of the transition, $\beta$ is the critical exponent of the order parameter, and $\nu$ that
of the correlation length. We choose the critical value $\tau_c$ of the control parameter as the
value of $\tau$ where the susceptibility, i.e., the average cluster size, has its maximum. Note
that $\tau_c$ could also be chosen as the point where the cluster size distribution becomes a
power law [2]; however, since our range of network sizes includes fairly small networks, this
would be too inaccurate, as in some cases there are not enough clusters for determining the shape
of the distribution.
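
A minimal sketch of locating the critical point from the susceptibility peak is given below. It assumes that cluster-size snapshots have been recorded at a series of $\tau$ values, and it uses the common convention of excluding one largest cluster from the average; the text above does not spell out this detail, so that convention, like the function names, is an assumption of the example.

```python
def average_cluster_size(cluster_sizes):
    """Sum of s^2 over sum of s, leaving out one largest cluster (assumed)."""
    finite = sorted(cluster_sizes)[:-1]
    total = sum(finite)
    return sum(s * s for s in finite) / total if total else 0.0


def critical_point(snapshots):
    """Return the tau whose snapshot maximizes the average cluster size.

    snapshots: list of (tau, list_of_cluster_sizes) pairs.
    """
    return max(snapshots, key=lambda snap: average_cluster_size(snap[1]))[0]
```

The order parameter evaluated at this $\tau_c$ for subnetworks of different sizes can then be fitted against $N$ on a log-log scale to estimate $\beta/\nu$ from Eq. (1).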

For the mobile phone call (MPC) network [Fig. 1(g)], we find that the scaling at $\tau_c$ of the
order parameter $s_{\max}/N$ yields a very small exponent $\beta/\nu \approx 0.03$ for the
MC-10 case, while for MC-2 and random percolation the exponents are larger,
$\beta/\nu \approx 0.14$ and $\beta/\nu \approx 0.42$, respectively. The exponents for the small
collaboration co-authorship (SCA) network behave similarly [Fig. 1(h)], with a low value
$\beta/\nu \approx 0.06$ for the MC-10 case, and relatively high values $\beta/\nu \approx 0.31$
and $\beta/\nu \approx 0.70$ for MC-2 and random percolation, respectively. In contrast, for
the co-authorship (CA) network, the exponents have high values for all cases [Fig. 1(i)],
$\beta/\nu \approx 0.13$, $\beta/\nu \approx 0.94$, and $\beta/\nu \approx 1.15$ for MC-10, MC-2,
and random percolation, respectively.

In order to compare our results to the existing literature, we follow the relations for the
critical exponents given in Ref. [2]: $\beta/\nu = \beta/(4\beta + 1)$. The value of the exponent
$\beta \approx 0.0555$ given in Ref. [2] yields $\beta/\nu \approx 0.0455$.
This value is consistent with our observation that the transition for MC-10 is explosive in the
MPC and SCA networks, while it is ordinary in the CA network. Note that such small but finite
values of the exponent are consistent with a second-order transition; however, because we
are dealing with single, finite-size networks, we cannot make definite conclusions. Further, in
all three systems MC-2 behaves similarly to ordinary random percolation.

FIG. 1. Variation of the relative size of the giant component, $s_{\max}/N$, with the scaled number of inter-cluster edges $\tau/N$ for the (a)
MPC, (b) CA, and (c) SCA networks. The corresponding variations in the gap, $\Delta = \tau(N/2) - \tau(\sqrt{N})$, as a function of system
size are shown in (d), (e), and (f) for the Random, MC-2, MC-10, and MC-$\infty$ rules. Solid lines indicate fitted scaling exponents
$\zeta$. The variation of the order parameter $s_{\max}/N$ as a function of the system size $N$ is shown for the (g) MPC, (h) CA, and (i) SCA
networks. For each system the order parameter is calculated at the critical point. The solid line indicates the best fit obtained
and the exponent $\beta/\nu$. All curves are averaged over 10 runs.

FIG. 2. Cluster size distributions around the critical $\tau_c$ for
the MPC, for MC-2 (a), MC-10 (b), and MC-$\infty$ (inset).



Thus our percolation analysis on the CA and SCA networks reveals a difference between
collaboration structures in different fields. One possible explanation is the broad degree
distribution of the CA network, whose tail can be approximated with a power law with exponent
$\approx 1.7$, in contrast to the SCA, whose tail decays with exponent $\approx 4.3$. Hence,
in this respect, the SCA network structure resembles the social network of the MPC. Further, it
is clear that the nature of the transition depends both on the number of edges $m$ considered in
the percolation process and on the structural features of the network.

For the rest of this paper we focus only on the complete MPC and SCA networks, and first study
their cluster size distributions around the critical point $\tau_c$. For the following, we have
chosen $\tau_c$ as the point at which $P(s)$ is a power law for the full region of $s$ [2]. The
complete networks are large enough to choose $\tau_c$ this way, giving us in this case a more
precise value than the susceptibility peaks. Then, we sweep the value of $\tau$ around this point
and monitor the distribution of cluster sizes. Fig. 2(a,b) shows the cluster size distributions
$P(s)$ around $\tau_c$ for the MPC, for MC-2, MC-10, and MC-$\infty$. For MC-2, $P(s)$
behaves as usual for ordinary percolation, becoming a power law at $\tau_c$ and then turning
exponential. For MC-10, the situation is different: for $\tau < \tau_c$, there is a bump
in the tail of the distribution, in line with theoretical predictions for explosive
percolation [5]. Immediately above $\tau_c$, the smallest remaining clusters get depleted from the
distribution as they are the first to join the giant cluster. For the semi-deterministic
MC-$\infty$ (inset), the cluster size distribution resembles an exponential for $\tau < \tau_c$.
The cluster size distributions for the SCA are qualitatively similar.

FIG. 3. Variation of the overlap, edge weight, and modularity as a function of the fraction of
links added, for the MPC (a,b,c) and SCA (d,e,f). The shaded area denotes the non-percolating
regime for MC-10.



B. Percolation clusters, weight-topology correlations, and communities

Next, we investigate the evolution of the percolation clusters and their relationship to
communities and the weight-topology correlations. We study the overlap of the neighborhoods of
endpoint nodes $i$ and $j$ of a link, defined as

$$O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}, \qquad (2)$$

where $n_{ij}$ is the number of neighbors common to both nodes, and $k_i$ and $k_j$ are their
degrees [11]. This measure quantifies the extent to which two connected nodes share their
neighborhoods: if $i$ and $j$ have no common neighbors, then $O_{ij} = 0$, and if $i$ and $j$
share all of their neighbors, $O_{ij} = 1$. Thus if there are dense communities in the network,
links inside the communities have high values of overlap, whereas links acting as "bridges"
connecting separate communities have low overlap values.

Fig. 3(a) displays the results for the MPC network. As expected, for random link addition, the
overlap and the time when edges are added in the percolation process are uncorrelated. For MC-10
and MC-$\infty$, edges with high overlap and weight are added first [Fig. 3(a,b)]. This indicates
that dense regions of the network, i.e., communities, get percolated first. Both quantities show
an abrupt drop at the transition point. This fits well with the Granovetterian weight-topology
correlations observed earlier [11]. However, the behavior of the SCA network is different.
Although high-overlap edges are added first [Fig. 3(d)], their weights are low [Fig. 3(e)]. This
points towards fundamentally different weight-topology correlations, where strong links act as
bridges between communities of weaker links. A likely explanation is that communities organize
around senior scientists (hubs), with whom junior researchers are linked. The latter have a
small number of joint publications with the local hubs, as they are only temporarily connected.
The hubs, in turn, are linked via long-lasting collaborations and many co-authored papers.

The relationship to community structure is confirmed by the behavior of the modularity [19] of
percolation clusters, defined as

$$\mathcal{M} = \sum_c \left[ \frac{L_c}{L} - \left( \frac{d_c}{2L} \right)^2 \right], \qquad (3)$$

where the sum runs over clusters, $L$ is the number of links in the network, $L_c$ is the number
of links within cluster $c$, and $d_c$ is the sum of the degrees of nodes in $c$. High values of
$\mathcal{M}$ correspond to a good community partition; hence, a high value of modularity
calculated for percolation clusters indicates that they match well with communities. As for the
other quantities, we calculate $\mathcal{M}$ as a function of the fraction of links added,
$f_{\text{links}}$. As seen in Fig. 3(c,f), the peak of $\mathcal{M}$ and the following sharp
transition match the transition points well for MC-10. For the semi-deterministic MC-$\infty$,
the peak also matches the percolation point, although the transition is less sharp.
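
The link overlap of Eq. (2) and the modularity of Eq. (3) can be sketched as follows. The adjacency representation (a dictionary mapping each integer node label to the set of its neighbors) and the function names are assumptions for this illustration. The sketch counts as $L_c$ every network link whose two endpoints carry the same cluster label, and it returns zero overlap for an isolated link whose denominator vanishes; both are conventions chosen here, not statements about the original analysis.

```python
def overlap(adj, i, j):
    """Neighborhood overlap O_ij of the link (i, j), following Eq. (2)."""
    n_ij = len(adj[i] & adj[j])  # number of common neighbors
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - n_ij
    return n_ij / denom if denom > 0 else 0.0


def modularity(adj, cluster_of):
    """Modularity of a partition into clusters, following Eq. (3).

    cluster_of maps each node to its cluster label.
    """
    n_links = sum(len(nbrs) for nbrs in adj.values()) / 2
    links_in = {}    # L_c: links with both endpoints in cluster c
    degree_sum = {}  # d_c: sum of degrees of the nodes in cluster c
    for u, nbrs in adj.items():
        c = cluster_of[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        for v in nbrs:
            if u < v and cluster_of[v] == c:  # count each link once
                links_in[c] = links_in.get(c, 0) + 1
    return sum(
        links_in.get(c, 0) / n_links - (d / (2 * n_links)) ** 2
        for c, d in degree_sum.items()
    )
```

On two triangles joined by one bridge, the links inside a triangle have overlap 1, the bridge has overlap 0, and labelling each triangle as its own cluster gives a modularity of 5/14.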



C. Analysis of network model with communities 

It appears that the explosive percolation process follows community structure when applied to a
network where such structure exists. Communities in real-world networks are, however, hard to
define unambiguously, and therefore we turn to a simple model with built-in community
structure [19, 20]. In this model, $N$ nodes are arranged into $M$ communities of equal size, and
edges are placed at random such that on average each node has $k_{\text{in}}$ intra-community
links and $k_{\text{out}}$ inter-community links. When applying the MC-$m$ sum rule to this
network, we find that mostly intra-community edges are occupied before the transition point
[Fig. 4(a)]. We quantify this by measuring the fraction of intra-community links that have been
added during the process, normalized by the respective fraction for random link addition. It is
evident from Fig. 4(b) that the MC rules prefer intra-community links early on in the process,
and inter-community links only get added towards the end.

FIG. 4. (a) Occupied (red) and unoccupied (blue) edges before the critical point in the model
network with the MC-10 rule. Here, $N = 100$, $M = 10$, $k_{\text{in}} = 9.6$, and
$k_{\text{out}} = 0.4$. (b) The fraction of intra-community links $f_{\text{in}}$ during the
percolation process, normalized by the average fraction $\langle f_{\text{in}} \rangle$
for random link addition. (c) Matching quality $Q$ against the largest cluster size $s_{\max}$
for the model network. (d) Maximum of the quality, $Q_{\max}$, as a function of $k_{\text{out}}$.
All curves are shown for the model with $N = 4096$, $M = 128$, and
$k_{\text{out}} + k_{\text{in}} = 16$. For (b) and (c), $k_{\text{out}} = 1$.
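
One way to realize the model described above is sketched below. The text only states that edges are placed at random with prescribed average intra- and inter-community degrees; drawing each node pair independently with probabilities $k_{\text{in}}/(N/M - 1)$ and $k_{\text{out}}/(N - N/M)$ is an assumption of this example, as are the function names. The double loop over node pairs is quadratic in $N$, which is acceptable for illustration but slow for large networks.

```python
import random


def community_model(n_nodes, n_comms, k_in, k_out, seed=0):
    """Random graph with equal-size planted communities (sketch).

    Returns the edge list and a list giving each node's community label.
    """
    if n_nodes % n_comms:
        raise ValueError("n_nodes must be divisible by n_comms")
    rng = random.Random(seed)
    size = n_nodes // n_comms
    p_in = k_in / (size - 1)         # expected k_in intra-community links
    p_out = k_out / (n_nodes - size)  # expected k_out inter-community links
    community = [u // size for u in range(n_nodes)]
    edges = []
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            p = p_in if community[u] == community[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, community


def intra_fraction(edges, community):
    """Fraction of the given edges that lie inside a community."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if community[u] == community[v])
    return intra / len(edges)
```

Evaluating `intra_fraction` on the first occupied edges of a percolation run, and dividing by the same quantity for random link addition, gives the kind of normalized curve described for Fig. 4(b).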

To quantify the match between percolation clusters and the model communities, we consider the
confusion matrix with elements

$$n_{kk'} = |C_k \cap C'_{k'}|, \quad \forall k, k', \qquad (4)$$

where $C_k$ is the $k$-th cluster and $C'_{k'}$ is the $k'$-th community. Hence, the element
$n_{kk'}$ represents the number of nodes in the intersection of cluster $C_k$ and community
$C'_{k'}$. There is a perfect match if clusters are subsets of communities and vice versa, i.e.,
clusters equal communities. The extent to which clusters are subsets of communities can be
measured by the projection number of $C$ on $C'$, defined as

$$p_C(C') = \sum_k \max_{k'} n_{kk'}, \qquad (5)$$

i.e., the sum of the maximum of each row in the confusion matrix. $p_C(C')$ increases with
cluster size, reaching its maximum when there is a single cluster that overlaps with all
communities. For the reverse case, communities as subsets of clusters, one can define a similar
projection number $p_{C'}(C)$, i.e., the sum of the maximum of each column in the matrix. This
number is maximized when the clusters are as small as possible, i.e., single nodes, and
decreases with increasing cluster size [21]. The quality of matching can now be quantified with
the normalized average of both projection numbers,

$$Q = \frac{p_C(C') + p_{C'}(C)}{2N}, \qquad (6)$$

reaching its maximum when the match between clusters and communities is optimal.
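
The matching quality of Eq. (6) can be computed directly from two label lists, as in the following sketch. The function names and the representation of a partition as a list of labels indexed by node are assumptions for this example. Since $Q$ adds the row-wise and column-wise projection numbers, it does not matter which of the two partitions is passed first.

```python
from collections import Counter


def confusion_matrix(part_a, part_b):
    """Map each label pair (a, b) to the number of nodes carrying both."""
    return Counter(zip(part_a, part_b))


def matching_quality(part_a, part_b):
    """Normalized average of the two projection numbers, Eq. (6)."""
    row_max, col_max = {}, {}
    for (a, b), count in confusion_matrix(part_a, part_b).items():
        row_max[a] = max(row_max.get(a, 0), count)  # maximum of each row
        col_max[b] = max(col_max.get(b, 0), count)  # maximum of each column
    n_nodes = len(part_a)
    return (sum(row_max.values()) + sum(col_max.values())) / (2 * n_nodes)
```

For two communities of three nodes each, identical partitions give $Q = 1$, a single cluster containing all nodes gives $Q = 0.75$, and six isolated nodes give $Q = 2/3$.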

Fig. 4(c) shows the behavior of $Q$ for the model network as a function of the size of the
largest observed cluster $s_{\max}$ spanned by the added links. Here we use the largest cluster
size $s_{\max}$ instead of $f_{\text{links}}$ because this provides us with a more detailed view
of what happens around the transition point; the cluster sizes change only a little beyond this
region. It is seen that $Q$ initially increases and then decreases as a function of $s_{\max}$,
reaching its maximum before the formation of the giant component and the merging of clusters.
The percolation clusters coincide well with the model communities below and around $\tau_c$
compared to random link addition. We next study the behavior of the maximum of the quality,
$Q_{\max}$, as we make the community structure more smeared-out by increasing $k_{\text{out}}$
while keeping the average total degree fixed [Fig. 4(d)]. Although $Q_{\max}$ decreases as
$k_{\text{out}}$ increases for both the MC-10 and random addition, its higher value for the
MC-10 process indicates a better match with the built-in communities.

We also obtain qualitatively similar results by using the normalized mutual information (NMI)
instead of the matching quality $Q$ (not shown). The mutual information [22] can be defined
using the confusion matrix as

$$I(C, C') = \sum_{k,k'} \frac{n_{kk'}}{N} \log \frac{n_{kk'} N}{n_k n_{k'}},$$

where $n_k = \sum_{k'} n_{kk'}$ and $n_{k'} = \sum_k n_{kk'}$ are the sizes of the $k$-th
community and $k'$-th cluster, respectively. The normalized mutual information is then defined as

$$\mathrm{NMI}(C, C') = \frac{2 I(C, C')}{H(C) + H(C')},$$

where $H(C) = -\sum_k (n_k/N) \log(n_k/N)$ is the entropy of the community $C$, and $H(C')$ is
the entropy of the cluster $C'$. In our case, where we have a large number of small communities,
the NMI does not, however, work as well as the matching quality. This is because the NMI values
are high already at the beginning of the percolation process, when all the nodes are isolated,
forming their own clusters. In this case,
$\mathrm{NMI}(C, C') = 2\left(\frac{\log N}{\log M} + 1\right)^{-1}$, which approaches 1 if the
model network size is increased keeping the community sizes $N/M$ fixed. In contrast, the
initial value of the quality is $Q = \frac{1}{2} + \frac{M}{2N}$, which is independent of the
number of communities.
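
The initial NMI value quoted above can be checked numerically with a short sketch. The function names and the list-of-labels representation are assumptions for this example, and the normalization by the sum of the two entropies follows the standard definition, which is consistent with the limiting value stated in the text. Returning 1 when both entropies vanish is a convention chosen here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a partition given as a list of node labels."""
    n_nodes = len(labels)
    return -sum(
        c / n_nodes * math.log(c / n_nodes) for c in Counter(labels).values()
    )


def mutual_information(part_a, part_b):
    """Mutual information computed from the confusion matrix."""
    n_nodes = len(part_a)
    size_a, size_b = Counter(part_a), Counter(part_b)
    joint = Counter(zip(part_a, part_b))
    return sum(
        c / n_nodes * math.log(c * n_nodes / (size_a[a] * size_b[b]))
        for (a, b), c in joint.items()
    )


def nmi(part_a, part_b):
    """Normalized mutual information, 2 I / (H_a + H_b)."""
    denom = entropy(part_a) + entropy(part_b)
    if denom == 0:
        return 1.0  # both partitions are trivial (convention chosen here)
    return 2 * mutual_information(part_a, part_b) / denom
```

With $N = 64$ nodes in $M = 8$ equal communities and every node forming its own cluster, the sketch gives $2\log 8/(\log 8 + \log 64) = 2/3$, in agreement with the closed-form expression.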

IV. SUMMARY AND CONCLUSIONS 

To summarize, we have shown that the Achlioptas procedure can give rise to an explosive
percolation transition when the rules are applied to empirical real-world social networks. We
have used a variant of the Minimum Cluster (MC) rule, where the number of links compared during
the link addition process is a parameter, and shown that both the network structure and the
number of links compared have an influence on the universality class (ordinary or explosive) of
the observed percolation transition. In order to show this, we have carried out finite-size
scaling using subnetworks, chosen on the basis of known external properties of the empirical
networks. This is an important but non-trivial task when percolation analysis is applied to
empirical networks where only a single "realization" is available. The resulting values for
critical exponents are in line with the view that the explosive percolation transition is in
fact second order; however, one cannot make definite conclusions since we are dealing with
single, finite-size networks.

In addition, we have illustrated a connection between links selected by the MC rule during the
percolation process and community structure: at the critical point, the cluster structure arising
from the application of the MC rule reflects the community structure of the network. This is
confirmed by the analysis of single-link properties (the overlap, link weight) and modularity for
the empirical networks, and by detailed studies of the match between clusters and the built-in
community structure of model networks.